15 research outputs found

    Task migration of DSP application specified with a DFG and implemented with the BSP computing model on a CPU-GPU cluster

    No full text
    International audience. Computer applications are becoming increasingly demanding while also requiring real-time results. Heterogeneous clusters, with their computing power, are a good answer to this need. However, during execution a computing element of the cluster may fail or need maintenance, or the load may need to be rebalanced. In this paper, we propose a migration strategy for relocating the execution of a task to another computing element. In particular, we are interested in remapping the nodes of a Data Flow Graph (DFG) representing a Digital Signal Processing (DSP) application onto heterogeneous (CPU-GPU) clusters while sustaining the flow of data and minimizing the temporal perturbation. We give a lower bound on the flow of data after the migration and validate it on the real-time construction of a visual saliency map from video input.

    Automatic lip tracking: Bayesian segmentation and active contours in a cooperative scheme

    No full text
    International audience. This paper presents an algorithm for extracting a speaker's lip contour. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is performed from RGB to HI (hue, intensity) color space. A Bayesian approach then segments the mouth area using Markov random field modelling, combining motion with red-hue lip information in a spatiotemporal neighbourhood. At the same time, a region of interest and the relevant boundary points are extracted automatically. Next, an active contour with spatially varying coefficients is initialised from the results of this preprocessing stage. Finally, an accurate lip shape with inner and outer borders is obtained, with good-quality results in this challenging situation.

    Forme optimale de mire pour le calibrage de cameras video [Optimal target shape for video camera calibration]

    No full text
    To obtain a video-camera-based measurement system with maximum accuracy, the calibration target must be studied in the context of the acquisition chain. Algorithms specialized in locating the reference points are chosen according to the pattern that makes up the target. In practice, square and disc patterns give similar results, mainly because of their large size. We detail the measurements made with these targets. The localization of the reference points in the image can be optimized with a checkerboard pattern, using local detection in the neighbourhood of the corner. A subpixel detection algorithm is proposed for this case.

    DFG implementation on multi GPU cluster with computation-communication overlap

    No full text
    International audience. Today's computers embed many CPUs and at least one GPU. Workstations can host several GPU cards, which are well suited to scientific and engineering computations. Such computers are linked through high-bandwidth networks to form HPC clusters. These machines provide highly parallel multicore architectures while being cost-effective. Moreover, they significantly reduce dissipated power and space requirements compared to classical HPC clusters. NVIDIA and ATI recently announced their Tesla and FireStream boards, delivering more than 500 gigaflops of double precision performance while dissipating less than 250 W per single-GPU board. However, the real challenge is to achieve the highest performance on multi-GPU architectures. The programmer has to design architecture-specific code that handles GPU communications and memory management, task scheduling and synchronization. A high-level abstract programming model is therefore required to express all these operations. In this paper, we propose a design flow allowing an efficient implementation of a DSP application specified as a DFG on a multi-GPU computer cluster. We focus particularly on the effective implementation of communications by automating the computation-communication overlap. After presenting the related work, we show the benefit of computation-communication overlap on multi-GPU architectures. Then, we present our design flow, which allows an efficient implementation of an algorithm expressed as a DFG on a multi-GPU architecture. Finally, it is applied to a real-world 3D granulometry application developed for materials research.

    FPGA implementation of a real time multi-resolution edge detection video filter

    No full text
    4 pages. International audience. This paper presents digital video filter labs for final-year engineering students. The project deals with the implementation of Canny-Deriche optimal edge detectors on an FPGA platform. The aim of these labs is to illustrate the design of integrated electronic systems and to introduce the concept of algorithm-architecture adequacy.

    Efficient implementation of data flow graphs on multi-gpu clusters

    No full text
    International audience. It is now possible to build a multi-GPU supercomputer, well suited to implementing digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communication and task synchronization. In this paper, we propose a high-level programming model based on a data flow graph (DFG) allowing an efficient implementation of digital signal processing applications on a multi-GPU computer cluster. This DFG-based design flow abstracts the underlying architecture. We focus particularly on the efficient implementation of communications by automating computation-communication overlap, which can lead to significant speedups, as shown in the presented benchmark. The approach is validated on three experiments: a multi-host multi-GPU benchmark, a 3D granulometry application developed for materials research, and an application for computing visual saliency maps.
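To see why overlapping computation with communication can pay off, here is a minimal timing sketch. It is not the authors' design flow: it assumes a hypothetical two-stage pipeline (one transfer engine, one compute engine, double buffering) with constant per-chunk costs. The function names and the figures in the usage note are illustrative only.

```python
def serial_time(n_chunks, t_transfer, t_compute):
    """Total time when every chunk is transferred and then processed, with no overlap."""
    return n_chunks * (t_transfer + t_compute)


def overlapped_time(n_chunks, t_transfer, t_compute):
    """Total time when the transfer of chunk i+1 overlaps the computation on chunk i.

    Assumption: constant per-chunk costs and two buffers. The first transfer
    and the last computation cannot be hidden; in between, the slower of the
    two stages sets the pace.
    """
    if n_chunks == 0:
        return 0.0
    return t_transfer + (n_chunks - 1) * max(t_transfer, t_compute) + t_compute


def speedup(n_chunks, t_transfer, t_compute):
    """Ratio of serial to overlapped time under the model above."""
    return serial_time(n_chunks, t_transfer, t_compute) / overlapped_time(
        n_chunks, t_transfer, t_compute
    )
```

With 8 chunks, 2.0 time units per transfer and 3.0 per computation, the serial schedule takes 40.0 units and the overlapped one 26.0. In this simple model the gain approaches a factor of two as the two stage costs become equal and the number of chunks grows.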

    Investigating performance variations of an optimized GPU-ported granulometry algorithm

    No full text
    International audience. In this article, we present an optimized GPU implementation of a granulometry algorithm that is widely used in materials science. The main contribution to this algorithm is the binarization of the input data, which increases throughput while reducing the memory allocated to the data. The optimized GPU implementation also brings an order-of-magnitude speedup over a multi-threaded CPU implementation. Furthermore, we investigate why GPU performance drops for certain input data dimensions. Three main factors are identified: under-exploited threads, thread blocks and streaming multiprocessors. This study should help the reader understand the tight relationship between the CUDA programming paradigm and the GPU architecture, as well as some of the main bottlenecks.
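The memory saving from binarization can be illustrated with a small bit-packing sketch. The abstract does not say how the binarized data are stored, so the threshold and the packing layout (eight voxels per byte, least significant bit first) are assumptions made for illustration, and the function names are hypothetical.

```python
def binarize(values, threshold):
    """Map each sample to 1 if it reaches the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 per byte, LSB first (assumed layout)."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def count_set(packed):
    """Count the foreground samples directly on the packed representation."""
    return sum(bin(byte).count("1") for byte in packed)


voxels = [12, 200, 90, 255, 0, 130, 128, 7, 250]  # made-up 8-bit samples
packed = pack_bits(binarize(voxels, 128))
```

Here nine one-byte samples shrink to two bytes, and operations such as counting foreground voxels can work on whole bytes (or machine words) at a time, which is one plausible source of the throughput gain the abstract reports.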

    Conception conjointe logiciel-matériel et microprocesseur embarqué, validation sur plateforme FPGA [Hardware-software co-design and embedded microprocessor, validation on an FPGA platform]

    No full text
    5 pages. International audience. As part of an introduction to integrated electronic systems, we propose a design project in which students discover an embedded processor at the heart of a digital signal processing chain. Using a VHDL description of the elementary processor provided to them, students simulate the execution of assembly programs and carry out the design flow of a SoC. The course was well received because it illustrates how the processor core works while validating both the hardware and the software design on an FPGA board.